Analytics has made it considerably easier to spot unusual actions in streaming data. Detecting unusual events in data patterns is called anomaly detection: trends can be tracked and potentially hazardous situations noticed in near real time, reducing the risk of further attacks. In this article, we’ll cover a few examples, explore why anomaly detection is necessary, and compare traditional statistical methods with machine learning methods.
Detecting anomalies - different approaches
Initially, anomaly tracking relied heavily on traditional statistical methods: tracking data streams, creating correlations, and reporting on previously unseen values.
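One common way to express the statistical approach is a z-score check: summarise the history with a mean and standard deviation, then report any new value that sits too far from that mean. The sketch below is only an illustration; the function name, the sample figures, and the threshold of 3 standard deviations are assumptions, not values from a real system.

```python
from statistics import mean, stdev


def zscore_anomalies(history, new_values, threshold=3.0):
    """Return the new values lying more than `threshold` standard
    deviations away from the mean of the historical data."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: anything different is unusual.
        return [v for v in new_values if v != mu]
    return [v for v in new_values if abs(v - mu) / sigma > threshold]


# Hypothetical history of a tracked metric (mean 100, standard deviation 2).
history = [100, 102, 98, 101, 99, 103, 97, 100]
print(zscore_anomalies(history, [101, 140, 96, 55]))  # → [140, 55]
```

The mean and standard deviation here are fixed once they are computed from the history, so the check only stays useful while the underlying data pattern stays the same.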
Once Machine Learning and Artificial Intelligence came along, this evolved rapidly.
With machine learning, historic data is collected, the information and correlations in it are extracted, and a model of the data is built from previous behaviour. Comparing real-time data against this predictive model makes anomalous behaviour much easier to detect, and the model becomes increasingly accurate as it continues to learn from reliable input data.
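The idea of comparing live data with a model that keeps learning can be sketched with a running baseline. The example below is a deliberately simple stand-in for a machine learning model, not a production technique: it keeps an exponentially weighted mean and variance, flags values that fall far outside them, and updates itself only with values it judged normal. The class name, `alpha`, `threshold`, and `warmup` are all assumptions chosen for illustration.

```python
class OnlineBaseline:
    """Running mean/variance that keeps learning from data judged normal."""

    def __init__(self, alpha=0.1, threshold=4.0, warmup=5):
        self.alpha = alpha          # how quickly the baseline adapts
        self.threshold = threshold  # allowed distance, in standard deviations
        self.warmup = warmup        # observations needed before flagging starts
        self.mean = None
        self.var = 0.0
        self.seen = 0

    def observe(self, value):
        """Return True if `value` is anomalous; otherwise learn from it."""
        if self.mean is None:
            self.mean = float(value)
            self.seen = 1
            return False
        deviation = value - self.mean
        std = self.var ** 0.5
        is_anomaly = (
            self.seen >= self.warmup
            and std > 0
            and abs(deviation) > self.threshold * std
        )
        if not is_anomaly:
            # Fold the value into the baseline so the model tracks slow drift.
            self.mean += self.alpha * deviation
            self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
            self.seen += 1
        return is_anomaly


# Hypothetical stream of readings with one obvious outlier.
baseline = OnlineBaseline()
for reading in [50, 52, 49, 51, 50, 48, 51, 120, 50]:
    if baseline.observe(reading):
        print("anomalous reading:", reading)
```

Skipping the update for flagged values is one possible design choice: it stops a single outlier from dragging the baseline towards itself, at the cost of adapting more slowly if the outlier turns out to be the start of a genuine change.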
Traditional statistics cannot be applied wholesale across applications; different data models require different statistical approaches to detect anomalies reliably. Continuous learning is also difficult for traditional techniques: as data patterns change, models have to be monitored and adjusted manually using new training data.
A simple example shows how anomaly detection is used in practice. A company tracks logins to its website using bespoke reporting software and has a steady history of locations, times, and visit durations. All logins made to this site:
Come from within the UK
Occur between 09:00 and 17:00 GMT
Last an average of 45 minutes
When a login is registered that matches all of the above criteria, except that the browser differs from the trend, traditional statistical methods that track only those three criteria would accept it as an ordinary browsing session, whereas machine learning algorithms would notice the anomalous browser and flag it as a potential issue. The session could then be halted or restricted automatically by cybersecurity software to minimise risk.
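The login scenario above can be sketched in a few lines. In this illustration, `passes_fixed_rules` encodes the three hand-written criteria, while `BrowserProfile` is a very simple frequency-based stand-in for a learned model of past behaviour. The field names, the 90-minute duration cut-off, the 1% rarity threshold, and the browser names are all hypothetical.

```python
from collections import Counter
from datetime import time


def passes_fixed_rules(login):
    """The three hand-written checks: country, time window, duration."""
    return (
        login["country"] == "UK"
        and time(9, 0) <= login["start"] <= time(17, 0)
        and login["minutes"] <= 90  # assumed cut-off around the 45-minute average
    )


class BrowserProfile:
    """Learns how often each browser appears in past logins."""

    def __init__(self, min_share=0.01):
        self.counts = Counter()
        self.min_share = min_share  # browsers rarer than this share are unusual

    def learn(self, login):
        self.counts[login["browser"]] += 1

    def is_unusual(self, login):
        total = sum(self.counts.values())
        if total == 0:
            return False  # nothing learned yet, so nothing to compare against
        return self.counts[login["browser"]] / total < self.min_share


# Hypothetical history: 150 Chrome logins and 50 Edge logins.
profile = BrowserProfile()
for browser in ["Chrome"] * 150 + ["Edge"] * 50:
    profile.learn({"browser": browser})

login = {"country": "UK", "start": time(10, 30), "minutes": 45,
         "browser": "UnknownBrowser/1.0"}
print(passes_fixed_rules(login))  # → True
print(profile.is_unusual(login))  # → True
```

The fixed rules pass the session because they only look at the attributes someone thought to write down, while the learned profile flags it because the browser has never appeared in the history.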
Let’s apply this type of anomaly detection to a Big Data environment and focus on a few of the different types of anomalies that can occur within streaming data.
1- If company sales were being tracked using analytical software, there would be a steady pattern of data from previous sales. If sales suddenly dropped or grew to a previously unattained level, this would be considered anomalous. It would either be used to improve sales and profit, resulting in positive ROI, or be detected as potential fraud, initiating an investigation to ensure transparency and improve cybersecurity measures.
2- A specific email address is created to send internal data emails only, and all outgoing emails are tracked to ensure data confidentiality. When there is a sudden spike in different data sets being emailed to a specific individual who doesn’t work within the applicable department, this would be classed as a highly sensitive anomaly and flagged by the system immediately to help mitigate a potential data leak.
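As a rough sketch of the second scenario, the check below counts how many distinct data sets each recipient received in a day and flags anyone outside the department whose count is well above the usual level. The function name, the addresses, the baseline of two data sets per day, and the multiplier of 3 are assumptions made for illustration; a real system would learn the baseline from history.

```python
from collections import defaultdict


def flag_recipients(emails, department_members, baseline_per_day, factor=3.0):
    """Flag recipients outside the department who received an unusually
    large number of distinct data sets.

    emails: iterable of (recipient, dataset_name) pairs sent in one day.
    baseline_per_day: typical number of distinct data sets per recipient.
    """
    datasets = defaultdict(set)
    for recipient, dataset in emails:
        datasets[recipient].add(dataset)

    flagged = []
    for recipient, names in datasets.items():
        outside_department = recipient not in department_members
        spike = len(names) > factor * baseline_per_day
        if outside_department and spike:
            flagged.append(recipient)
    return sorted(flagged)


# Hypothetical day of outgoing mail from the internal data address.
department = {"alice@corp.example"}
emails = (
    [("alice@corp.example", f"report_{i}") for i in range(2)]
    + [("bob@corp.example", f"dataset_{i}") for i in range(8)]
    + [("carol@corp.example", "report_0")]
)
print(flag_recipients(emails, department, baseline_per_day=2))
# → ['bob@corp.example']
```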
Anomaly detection is used within most companies, even if only at a rudimentary level, but it has much greater uses, especially as data protection and anomalous behaviour are evolving at such a high rate.
By using analytics to target phishing activity, malware behaviours, and associated tactics, techniques, and procedures (TTPs), awareness of these threats is increased, and organisations become better prepared for the threats they are likely to encounter.
If you’re interested in using machine learning for your organisation specifically, please get in touch: info@discovranalytics.co
50/60 Station Road,
Cambridge,
CB1 2JH,
UK
© Discovr Analytics Limited 2020 All rights reserved.